Elementary symmetric polynomial

In mathematics, specifically in commutative algebra, the elementary symmetric polynomials are one type of basic building block for symmetric polynomials, in the sense that any symmetric polynomial P can be expressed as a polynomial in elementary symmetric polynomials: P can be given by an expression involving only additions and multiplications of constants and elementary symmetric polynomials. There is one elementary symmetric polynomial of degree d in n variables for each nonnegative integer d ≤ n, and it is formed by adding together all distinct products of d distinct variables.

Definition

The elementary symmetric polynomials in n variables X1, …, Xn, written ek(X1, …, Xn) for k = 0, 1, ..., n, can be defined as

\begin{align}
  e_0 (X_1, X_2, \dots,X_n) &= 1,\\
  e_1 (X_1, X_2, \dots,X_n) &= \textstyle\sum_{1 \leq j \leq n} X_j,\\
  e_2 (X_1, X_2, \dots,X_n) &= \textstyle\sum_{1 \leq j < k \leq n} X_j X_k,\\
  e_3 (X_1, X_2, \dots,X_n) &= \textstyle\sum_{1 \leq j < k < l \leq n} X_j X_k X_l,\\
\end{align}

and so forth, down to

 e_n (X_1, X_2, \dots,X_n) = X_1 X_2 \ldots X_n

(sometimes the notation σk is used instead of ek). In general, for k ≥ 0 we define

 e_k (X_1 , \ldots , X_n )=\sum_{1\le  j_1 < j_2 < \ldots < j_k \le n} X_{j_1} \ldots X_{j_k}.

Thus, for each positive integer k less than or equal to n, there exists exactly one elementary symmetric polynomial of degree k in n variables. To form the one of degree k, we take the product of the variables in each k-element subset of the n variables and add up these products.
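The definition translates directly into a sum over k-element subsets. The following is a minimal Python sketch that evaluates e_k at numerical values; the function name elementary_symmetric is a hypothetical choice for illustration, not a library function.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(k, values):
    """Evaluate e_k at the given values (hypothetical helper name)."""
    # Sum the products over all k-element subsets of the values.
    # For k = 0 the only subset is empty and its product is 1, so e_0 = 1.
    # For k > len(values) there are no subsets, so the sum is 0.
    return sum(prod(subset) for subset in combinations(values, k))

values = [2, 3, 5]
print([elementary_symmetric(k, values) for k in range(4)])  # [1, 10, 31, 30]
```

Here e_2(2, 3, 5) = 2·3 + 2·5 + 3·5 = 31, matching the sum over the three 2-element subsets.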

The identities X_1X_2=X_2X_1 and so forth hold because the variables commute, which is the defining feature of commutative algebra. In particular, the ring formed by taking all linear combinations of products of the elementary symmetric polynomials is a commutative ring.

Examples

The following lists the n elementary symmetric polynomials for the first four positive values of n. (In every case, e0 = 1 is also one of the polynomials.)

For n = 1:

e_1(X_1) = X_1.\,

For n = 2:

\begin{align}
 e_1(X_1,X_2)  &= X_1 + X_2,\\
 e_2(X_1,X_2) &= X_1X_2.\,\\
\end{align}

For n = 3:

\begin{align}
 e_1(X_1,X_2,X_3) &= X_1 + X_2 + X_3,\\
 e_2(X_1,X_2,X_3) &= X_1X_2 + X_1X_3 + X_2X_3,\\
 e_3(X_1,X_2,X_3) &= X_1X_2X_3.\,\\
\end{align}

For n = 4:

\begin{align}
 e_1(X_1,X_2,X_3,X_4) &= X_1 + X_2 + X_3 + X_4,\\
 e_2(X_1,X_2,X_3,X_4) &= X_1X_2 + X_1X_3 + X_1X_4 + X_2X_3 + X_2X_4 + X_3X_4,\\
 e_3(X_1,X_2,X_3,X_4) &= X_1X_2X_3 + X_1X_2X_4 + X_1X_3X_4 + X_2X_3X_4,\\
 e_4(X_1,X_2,X_3,X_4) &= X_1X_2X_3X_4.\,\\
\end{align}

Properties

The elementary symmetric polynomials appear when we expand a linear factorization of a monic polynomial: we have the identity

\prod_{j=1}^n ( \lambda-X_j)=\lambda^n-e_1(X_1,\ldots,X_n)\lambda^{n-1}+e_2(X_1,\ldots,X_n)\lambda^{n-2}-\cdots+(-1)^n e_n(X_1,\ldots,X_n).

That is, when we substitute numerical values for the variables X_1,X_2,\dots,X_n, we obtain the monic univariate polynomial (with variable λ) whose roots are the values substituted for X_1,X_2,\dots,X_n and whose coefficients are, up to sign, the elementary symmetric polynomials evaluated at those values.
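The identity can be checked numerically. The sketch below multiplies out the linear factors for sample integer roots and compares the coefficients with the signed values (-1)^k e_k; the helper names e and expand_product are hypothetical and chosen only for this illustration.

```python
from itertools import combinations
from math import prod

def e(k, values):
    """Evaluate the k-th elementary symmetric polynomial at the given values."""
    return sum(prod(subset) for subset in combinations(values, k))

def expand_product(roots):
    """Coefficients of prod_j (lam - r_j), highest power of lam first."""
    coeffs = [1]
    for r in roots:
        shifted = coeffs + [0]                  # current polynomial times lam
        scaled = [0] + [r * c for c in coeffs]  # current polynomial times r
        coeffs = [a - b for a, b in zip(shifted, scaled)]
    return coeffs

roots = [2, 3, 5]
signed = [(-1) ** k * e(k, roots) for k in range(len(roots) + 1)]
print(expand_product(roots))  # [1, -10, 31, -30]
print(signed)                 # [1, -10, 31, -30]
```

Both lists describe λ^3 − 10λ^2 + 31λ − 30 = (λ − 2)(λ − 3)(λ − 5).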

The characteristic polynomial of a linear operator is an example of this. The roots are the eigenvalues of the operator. When we substitute these eigenvalues into the elementary symmetric polynomials, we obtain the coefficients of the characteristic polynomial, which are numerical invariants of the operator. This fact is useful in linear algebra and its applications and generalizations, like tensor algebra and disciplines which extensively employ tensor fields, such as differential geometry.
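As a small worked illustration (the matrix A below is an arbitrary example chosen here, not taken from the text), consider a 2 × 2 matrix with eigenvalues 1 and 3. The coefficients of its characteristic polynomial are the trace and the determinant, which agree with e_1 and e_2 evaluated at the eigenvalues:

```latex
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
\det(\lambda I - A) = (\lambda - 2)^2 - 1 = \lambda^2 - 4\lambda + 3 = (\lambda - 1)(\lambda - 3),
\qquad
e_1(1,3) = 1 + 3 = 4 = \operatorname{tr} A, \qquad
e_2(1,3) = 1 \cdot 3 = 3 = \det A.
```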

The set of elementary symmetric polynomials in n variables generates the ring of symmetric polynomials in n variables. More specifically, the ring of symmetric polynomials with integer coefficients equals the integral polynomial ring \mathbb Z[e_1(X_1,\ldots,X_n),\ldots,e_n(X_1,\ldots,X_n)]. (See below for a more general statement and proof.) This fact is one of the foundations of invariant theory. For other systems of symmetric polynomials with a similar property see power sum symmetric polynomials and complete homogeneous symmetric polynomials.

The fundamental theorem of symmetric polynomials

For any commutative ring A denote the ring of symmetric polynomials in the variables  X_1,\ldots,X_n with coefficients in A by  A[X_1,\ldots,X_n]^{S_n} .

 A[X_1,\ldots,X_n]^{S_n} is a polynomial ring in the n elementary symmetric polynomials  e_k (X_1 , \ldots ,X_n ) for k = 1, ..., n.

(Note that e_0 is not among these polynomials; since e_0=1, it cannot be a member of any set of algebraically independent elements.)

This means that every symmetric polynomial  P(X_1,\ldots, X_n) \in 
A[X_1,\ldots,X_n]^{S_n} has a unique representation

 P(X_1,\ldots, X_n)=Q(e_1(X_1 , \ldots ,X_n), \ldots, e_n(X_1 , \ldots ,X_n))

for some polynomial  Q \in A[Y_1,\ldots,Y_n] . Another way of saying the same thing is that  A[X_1,\ldots,X_n]^{S_n} is isomorphic to the polynomial ring A[Y_1,\ldots,Y_n] through an isomorphism that sends Y_k to e_k(X_1 , \ldots ,X_n) for k=1,\ldots,n.

Proof sketch

The theorem may be proved for symmetric homogeneous polynomials by a double mathematical induction with respect to the number of variables n and, for fixed n, with respect to the degree of the homogeneous polynomial. The general case then follows by splitting an arbitrary symmetric polynomial into its homogeneous components (which are again symmetric).

In the case n = 1 the result is obvious because every polynomial in one variable is automatically symmetric.

Assume now that the theorem has been proved for all symmetric polynomials in m < n variables and for all symmetric polynomials in n variables with degree < d. Every homogeneous symmetric polynomial P in  A[X_1,\ldots,X_n]^{S_n} can be decomposed as a sum of homogeneous symmetric polynomials

 P(X_1,\ldots,X_n)= P_{\mbox{lacunary}} (X_1,\ldots,X_n)  + X_1 \cdots X_n \cdot Q(X_1,\ldots,X_n).

Here the "lacunary part"  P_{\mbox{lacunary}} is defined as the sum of all monomials in P which contain only a proper subset of the n variables X1, ..., Xn, i.e., where at least one variable Xj is missing.

Because P is symmetric, the lacunary part is determined by its terms containing only the variables X1, ..., Xn−1, i.e., which do not contain Xn. These are precisely the terms that survive the operation of setting Xn to 0, so their sum equals P(X_1, \ldots,X_{n-1},0) , which is a symmetric polynomial in the variables X1, ..., Xn−1 that we shall denote by  \tilde{P}(X_1, \ldots, X_{n-1}). By the inductive assumption, this polynomial can be written as

 \tilde{P}(X_1, \ldots, X_{n-1})=\tilde{Q}(\sigma_{1,n-1}, \ldots, \sigma_{n-1,n-1})

for some  \tilde{Q}. Here the doubly indexed  \sigma_{j,n-1} denote the elementary symmetric polynomials in n−1 variables.

Consider now the polynomial

 R(X_1, \ldots, X_{n}):= \tilde{Q}(\sigma_{1,n}, \ldots, \sigma_{n-1,n}) \ .

Then R(X_1, \ldots, X_{n}) is a symmetric polynomial in X1, ..., Xn, of the same degree as  P_{\mbox{lacunary}}, which satisfies

R(X_1, \ldots, X_{n-1},0) = \tilde{Q}(\sigma_{1,n-1}, \ldots, \sigma_{n-1,n-1}) = P(X_1, \ldots,X_{n-1},0)

(the first equality holds because setting Xn to 0 in \sigma_{j,n} gives \sigma_{j,n-1}, for all j<n); in other words, the lacunary part of R coincides with that of the original polynomial P. Therefore the difference P − R has no lacunary part, and is therefore divisible by the product  X_1 \cdots X_n of all variables, which equals the elementary symmetric polynomial \sigma_{n,n}. Then writing P-R=\sigma_{n,n}\,Q, the quotient Q is a homogeneous symmetric polynomial of degree less than d (in fact degree at most d − n) which by the inductive assumption can be expressed as a polynomial in the elementary symmetric polynomials. Combining the representations for P − R and R one finds a polynomial representation for P.
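A small worked instance of this inductive step may help; the polynomial P below is an illustrative choice, not taken from the text. Take n = 2 and P = X_1^2 + X_2^2, so d = 2:

```latex
\begin{align}
  \tilde{P}(X_1) &= P(X_1, 0) = X_1^2 = \sigma_{1,1}^2,
    \quad\text{so } \tilde{Q}(Y_1) = Y_1^2,\\
  R(X_1, X_2) &= \tilde{Q}(\sigma_{1,2}) = (X_1 + X_2)^2 = X_1^2 + 2X_1X_2 + X_2^2,\\
  P - R &= -2X_1X_2 = \sigma_{2,2}\cdot Q \quad\text{with } Q = -2,\\
  P &= \sigma_{1,2}^2 - 2\,\sigma_{2,2}.
\end{align}
```

Here Q is a constant, of degree 0 = d − n, in line with the degree bound in the argument.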

The uniqueness of the representation can be proved inductively in a similar way. (It is equivalent to the fact that the n polynomials  e_1, \ldots, e_n are algebraically independent over the ring A.) The fact that the polynomial representation is unique implies that  A[X_1,\ldots,X_n]^{S_n} is isomorphic to  A[Y_1,\ldots,Y_n] .

An alternative proof

The following proof is also inductive, but involves no polynomials other than those symmetric in X_1,\ldots,X_n, and also leads to a fairly direct procedure for writing a symmetric polynomial as a polynomial in the elementary symmetric ones. Assume the symmetric polynomial to be homogeneous of degree d; different homogeneous components can be decomposed separately. Order the monomials in the variables X_i lexicographically, where the individual variables are ordered X_1 > \cdots > X_n; in other words, the dominant term of a polynomial is one with the highest occurring power of X_1, and among those the one with the highest power of X_2, etc. Furthermore, parametrize all products of elementary symmetric polynomials that have degree d (they are in fact homogeneous) by partitions of d, as follows. Order the individual elementary symmetric polynomials e_i(X_1,\ldots,X_n) in the product so that those with larger indices i come first, then build for each such factor a column of i boxes, and arrange those columns from left to right to form a Young diagram containing d boxes in all. The shape of this diagram is a partition of d, and each partition λ of d arises for exactly one product of elementary symmetric polynomials, which we shall denote by e_{\lambda^t}(X_1,\ldots,X_n) (the "t" is present only because traditionally this product is associated to the transpose partition of λ). The essential ingredient of the proof is the following simple property, which uses multi-index notation for monomials in the variables X_i.

Lemma. The leading term of e_{\lambda^t}(X_1,\ldots,X_n) is X^\lambda.

Proof. To get the leading term of the product one must select the leading term in each factor e_i(X_1,\ldots,X_n), which is clearly X_1X_2\cdots X_i, and multiply these together. To count the occurrences of the individual variables in the resulting monomial, fill the column of the Young diagram corresponding to the factor concerned with the numbers 1, …, i of the variables; then all boxes in the first row contain 1, those in the second row 2, and so forth, which means the leading term is X^\lambda (its coefficient is 1 because there is only one choice that leads to this monomial).

Now one proves by induction on the leading monomial in lexicographic order that any nonzero homogeneous symmetric polynomial P of degree d can be written as a polynomial in the elementary symmetric polynomials. Since P is symmetric, its leading monomial has weakly decreasing exponents, so it is some X^\lambda with λ a partition of d. Let the coefficient of this term be c; then P − c e_{\lambda^t}(X_1,\ldots,X_n) is either zero or a symmetric polynomial with a strictly smaller leading monomial. Writing this difference inductively as a polynomial in the elementary symmetric polynomials, and adding back c e_{\lambda^t}(X_1,\ldots,X_n) to it, one obtains the sought-for polynomial expression for P.
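To illustrate the induction, here is a worked example with a polynomial chosen for this purpose (it does not appear in the text): n = 3 and P = X_1^2 + X_2^2 + X_3^2, with d = 2. Each step reads off the leading monomial X^\lambda, transposes λ, and subtracts the corresponding product of elementary symmetric polynomials:

```latex
\begin{align}
  &\text{leading monomial } X_1^2 = X^{(2)},\quad \lambda = (2),\ \lambda^t = (1,1),\quad
    e_{\lambda^t} = e_1 e_1 = e_1^2,\ c = 1,\\
  &P - e_1^2 = -2\,(X_1X_2 + X_1X_3 + X_2X_3),\\
  &\text{leading monomial } X_1X_2 = X^{(1,1)},\quad \lambda = (1,1),\ \lambda^t = (2),\quad
    e_{\lambda^t} = e_2,\ c = -2,\\
  &P - e_1^2 + 2e_2 = 0, \qquad\text{hence}\qquad P = e_1^2 - 2e_2.
\end{align}
```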

The fact that this expression is unique, or equivalently that all the products (monomials) e_{\lambda^t}(X_1,\ldots,X_n) of elementary symmetric polynomials are linearly independent, is also easily proved. The lemma shows that all these products have different leading monomials, and this suffices: if a nontrivial linear combination of the e_{\lambda^t}(X_1,\ldots,X_n) were zero, one focusses on the contribution in the linear combination with nonzero coefficient and with (as polynomial in the variables X_i) the largest leading monomial; the leading term of this contribution cannot be cancelled by any other contribution of the linear combination, which gives a contradiction.

A self-contained algorithmic proof

The following proof of the existence (not of the uniqueness) of Q is the same as the above, but rewritten in elementary terms and with a slightly different choice of lexicographic order.

The symmetric polynomial P(x_1,\ldots,x_n) is a sum of monomials of the form cx_1^{i_1}\ldots x_n^{i_n}, where the i_j are nonnegative integers and c is a scalar (i.e., an element of our ring A). We define a partial order on the monomials by specifying that

c_1x_1^{i_1}\ldots x_n^{i_n} < c_2x_1^{j_1}\ldots x_n^{j_n}\,\!

if c_2\ne0 and there is some 0\leq k \leq n-1 such that i_{n-l}=j_{n-l} for l=0,1,\ldots ,k-1 but i_{n-k}<j_{n-k}. For instance 10x_1^2x_2^3x_3^4 < 2x_1^6x_2^4x_3^5 and 3x_1^2x_2^4x_3^5 < -7x_1^4x_2^5x_3^5. (The coefficients c_1 and c_2 have no bearing on whether c_1x_1^{i_1}\ldots x_n^{i_n} < c_2x_1^{j_1}\ldots x_n^{j_n}, as long as they are nonzero; only the exponents matter.) In words: starting at the nth position in both monomials, move back towards the first position until the two exponents differ. The monomial with the larger exponent in that position is the larger monomial. This is called a lexicographic order on the monomials.
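In code, this comparison amounts to comparing exponent sequences read from the last variable back to the first. A minimal Python sketch, in which monomials are represented by their exponent tuples (i_1, …, i_n) and the name order_key is a hypothetical choice:

```python
def order_key(exponents):
    """Sort key for the order above: compare exponents from x_n back to x_1."""
    # Python compares tuples position by position, so reversing the exponent
    # tuple makes the exponent of x_n the most significant entry.
    return tuple(reversed(exponents))

# The two comparisons given as examples above (coefficients are irrelevant).
assert order_key((2, 3, 4)) < order_key((6, 4, 5))
assert order_key((2, 4, 5)) < order_key((4, 5, 5))

monomials = [(6, 4, 5), (2, 3, 4), (4, 5, 5), (2, 4, 5)]
print(max(monomials, key=order_key))  # (4, 5, 5)
```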

We reduce P to elementary symmetric polynomials by successively subtracting from P a product of elementary symmetric polynomials that eliminates the largest monomial according to this order without introducing any larger monomials. This way, in each step, the largest monomial becomes smaller and smaller until nothing remains, and we are done: the sum of the subtracted-off polynomials is the desired expression of P as a polynomial in the elementary symmetric polynomials.

Here is how each step of this algorithm works: Suppose c x_1^{i_1}\ldots x_n^{i_n}\,\! is the largest monomial in P. Then we must have i_1 \leq i_2 \leq ... \leq i_n, since otherwise this monomial could not be the largest one of P (in fact, due to P being symmetric, the polynomial P must also have the monomial c x_1^{j_1}\ldots x_n^{j_n}\,\! where \left(j_1,j_2,\ldots, j_n\right) is the sequence \left(i_1,i_2,\ldots, i_n\right) sorted in increasing order; but this monomial is larger than the monomial c x_1^{i_1}\ldots x_n^{i_n}\,\! unless we have i_1 \leq i_2 \leq ... \leq i_n). Thus, we can define a symmetric polynomial R by


R = cs_1^{i_n-i_{n-1}}s_2^{i_{n-1}-i_{n-2}}\ldots s_{n-1}^{i_2-i_1}s_n^{i_1}\,\!

where s_k is the kth elementary symmetric polynomial in the n variables x_1,\ldots,x_n\,\!. Clearly R is a polynomial in the elementary symmetric polynomials. Now we claim that the largest monomial of R is c x_1^{i_1}\ldots x_n^{i_n}\,\!. To prove this, we notice that the largest monomial of 
R = cs_1^{i_n-i_{n-1}}s_2^{i_{n-1}-i_{n-2}}\ldots s_{n-1}^{i_2-i_1}s_n^{i_1}\,\!
is clearly equal to

 c\left(\text{largest monomial of }s_1\right)^{i_n-i_{n-1}}
 \cdot\left(\text{largest monomial of }s_2\right)^{i_{n-1}-i_{n-2}}
 \cdot \ldots
 \cdot\left(\text{largest monomial of }s_{n-1}\right)^{i_2-i_1}
 \cdot\left(\text{largest monomial of }s_n\right)^{i_1}
 = c\left(x_n\right)^{i_n-i_{n-1}}\left(x_{n-1}x_n\right)^{i_{n-1}-i_{n-2}}\ldots \left(x_2x_3...x_n\right)^{i_2-i_1}\left(x_1x_2...x_n\right)^{i_1}

(since the largest monomial of s_i is x_{n-i+1}x_{n-i+2}\cdots x_n for every i).

In this monomial, the variable x_n occurs with exponent i_n (since it occurs with exponent i_n-i_{n-1} in the first factor, i_{n-1}-i_{n-2} in the second factor, and so on, down to i_1 in the final factor), the variable x_{n-1} occurs with exponent i_{n-1} (since it occurs with exponent i_{n-1}-i_{n-2} in the second factor, i_{n-2}-i_{n-3} in the third factor, and so on, down to i_1 in the final factor), and so on for the remaining variables. Hence, this monomial must be c x_1^{i_1}\ldots x_n^{i_n}. Thus we have shown that the largest monomial of R is c x_1^{i_1}\ldots x_n^{i_n}. Therefore, subtracting R from P eliminates the monomial c x_1^{i_1}\ldots x_n^{i_n}, and all monomials of P-R are smaller than the one just eliminated. Thus, we have found a polynomial R, which is a polynomial in the elementary symmetric polynomials, such that subtracting R from P leaves us with a new symmetric polynomial P-R whose largest monomial is smaller than that of P. We can now continue the process until nothing remains in P. The process stops after finitely many steps: each R is homogeneous of the same total degree as the monomial it eliminates, so no monomial of degree higher than that of P ever appears, and there are only finitely many monomials of bounded degree.

Here is an example of the above algorithm. Suppose P(x_1,x_2)= (x_1 + 7x_1x_2 + x_2)^2. Expanding this into monomials, we get


P=x_1^2 + 2x_1x_2 + 14x_1^2x_2 + x_2^2 + 14x_1x_2^2 + 49x_1^2x_2^2.

The largest monomial is 49x_1^2x_2^2, so we subtract off 49s_2^2, getting


P-49s_2^2 = x_1^2 + 2x_1x_2 + 14x_1^2x_2 + x_2^2 + 14x_1x_2^2.

Now the largest monomial is 14x_1x_2^2, so we subtract off 14s_1s_2, getting


P-49s_2^2-14s_1s_2 = x_1^2 + 2x_1x_2 + x_2^2.

Now the largest monomial is x_2^2 , so we subtract off s_1^2, getting


P-49s_2^2-14s_1s_2-s_1^2 = 0.

This gives


P(x_1,x_2) = 49s_2(x_1,x_2)^2+14s_1(x_1,x_2)s_2(x_1,x_2)+s_1(x_1,x_2)^2.
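The reduction procedure of this section can be sketched in Python as follows. This is only an illustrative implementation under stated assumptions: polynomials are stored as dictionaries mapping exponent tuples (i_1, …, i_n) to integer coefficients, and all function names (elementary, multiply, subtract, reduce_symmetric) are hypothetical. The result maps each tuple of powers (a_1, …, a_n) to the coefficient of s_1^{a_1} ⋯ s_n^{a_n}.

```python
from collections import defaultdict
from itertools import combinations

def elementary(k, n):
    """s_k in n variables, as a dict {exponent tuple: coefficient}."""
    return {tuple(1 if j in subset else 0 for j in range(n)): 1
            for subset in combinations(range(n), k)}

def multiply(p, q):
    result = defaultdict(int)
    for ea, ca in p.items():
        for eb, cb in q.items():
            result[tuple(a + b for a, b in zip(ea, eb))] += ca * cb
    return {e: c for e, c in result.items() if c != 0}

def subtract(p, q):
    result = defaultdict(int, p)
    for e, c in q.items():
        result[e] -= c
    return {e: c for e, c in result.items() if c != 0}

def reduce_symmetric(p, n):
    """Write a symmetric p as a sum of c * s_1^a_1 * ... * s_n^a_n.

    Returns {(a_1, ..., a_n): c}. Assumes p is symmetric; raises
    ValueError if a largest monomial has non-increasing-order exponents.
    """
    answer = {}
    while p:
        # Largest monomial: compare exponents from x_n back to x_1.
        lead = max(p, key=lambda exps: tuple(reversed(exps)))
        c = p[lead]
        i = (0,) + lead  # i[m] is the exponent i_m, with i_0 = 0
        # s_k is raised to the power i_{n-k+1} - i_{n-k}.
        powers = tuple(i[n - k + 1] - i[n - k] for k in range(1, n + 1))
        if any(a < 0 for a in powers):
            raise ValueError("input polynomial is not symmetric")
        r = {(0,) * n: c}
        for k, a in enumerate(powers, start=1):
            for _ in range(a):
                r = multiply(r, elementary(k, n))
        p = subtract(p, r)
        answer[powers] = c
    return answer

# The example above: P = (x_1 + 7 x_1 x_2 + x_2)^2.
base = {(1, 0): 1, (1, 1): 7, (0, 1): 1}
P = multiply(base, base)
print(reduce_symmetric(P, 2))  # {(0, 2): 49, (1, 1): 14, (2, 0): 1}
```

The printed dictionary reads as 49 s_2^2 + 14 s_1 s_2 + s_1^2, in agreement with the hand computation. Dictionaries keep the sketch free of external dependencies; a computer algebra system would be the more practical choice for larger inputs.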

See also

References